Tuesday, June 07, 2011

Modeling derivations: Questions and Answers

The past two posts (part I and part II) described a system (or style) for scientific programming that I've been working out. This post is in more of a Q&A format. (The questions are mostly ones I ask myself - perhaps it would be better called a 'rhetoric & rant' format.) An abstract for a talk and paper was accepted to SciPy 2011, so hopefully I can explain the idea clearly enough by then to get some feedback.

- What is the target application? Goals?
The goals are to better enable algorithm research and development (that is, writing a scientific code), and to provide the flexibility to target different runtime systems.
Looking at the examples and case studies on various computer algebra system (CAS) websites, I see many cases where the user modeled some system, solved it, and then used the solution to improve or fix the system. The hard part seems to be modeling the system - this is not my target. I'm interested in problems where the second step is the hard part - where one needs to research new and better solvers.

I'm thinking in terms of a traditional scientific code, and how to automate parts of the development workflow. One big issue is how to use a CAS to assist in the development. I'm also interested in problems that require high performance computing (HPC) - basically problems that will consume as much computing power and algorithm research as possible. This likely means the code will need to be tuned and altered to support different target systems (multicore workstations, distributed-memory parallel clusters, accelerators like GPUs, etc.).
- Why not use a computer algebra system directly? It looks like you're creating a lot of extra work for yourself?
I would like a separation between the specification of an algorithm (for scientific computation, this is largely expressed in equations) and its implementation in the runtime system (along with an automated transformation from the spec to the final system).
Typically a CAS is associated with a full-featured programming language, but implementing the algorithm directly in that language doesn't preserve this separation.
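As a minimal sketch of what I mean by the separation (using SymPy purely for illustration - the approach isn't tied to any particular CAS): the specification lives as a symbolic expression, and an implementation is produced from it mechanically.

```python
import sympy as sp

# Specification: the math lives as a symbolic expression, not as code
# written for any particular runtime system.
x = sp.symbols('x')
spec = sp.sin(x) * sp.exp(-x**2)

# Implementation: a numeric routine generated mechanically from the spec.
# Here the target is plain Python; other targets would use other backends.
f = sp.lambdify(x, spec, modules='math')
print(f(0.5))  # evaluates the generated implementation
```

Nothing in the specification above commits to a runtime; swapping the last step swaps the target system without touching the math.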
As a concrete example, a CAS typically includes facilities for performing numerical integration. What if I want to develop a new numerical integration method - are there any examples of how a CAS can be used to do that?
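One hedged possibility, again with SymPy standing in for the CAS: derive a quadrature rule symbolically by integrating an interpolating polynomial exactly. The sketch below recovers Simpson's rule; a new method would follow the same pattern with different interpolants or constraints.

```python
import sympy as sp

x, h = sp.symbols('x h', positive=True)
f0, f1, f2 = sp.symbols('f0 f1 f2')

# Quadratic Lagrange interpolant through (0, f0), (h, f1), (2h, f2).
points = [(0, f0), (h, f1), (2*h, f2)]
p = sum(fi * sp.prod((x - xj) / (xi - xj)
                     for xj, _ in points if xj != xi)
        for xi, fi in points)

# Integrating the interpolant exactly over [0, 2h] yields the weights
# h/3 * (f0 + 4*f1 + f2), i.e. Simpson's rule.
rule = sp.simplify(sp.integrate(p, (x, 0, 2*h)))
print(rule)
```

The derivation itself is done by the CAS; the researcher's work shifts to choosing the form of the method.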
Another issue with implementing HPC-style problems in a CAS is what happens when the existing solvers in the CAS are too slow, or the CAS's language is too slow (e.g., how do you scale to large parallel systems?). The CAS may do a great deal of work for you, but what happens when you want to move past it? This is where code generation comes in.
- What is the role of code generation in this system?
More generally, the 'code generation' step is the 'transformation to the target runtime system' step. The transformation may be trivial if the final equations can be solved by the CAS's own facilities, but I'm anticipating the need to ultimately target C/C++ or Fortran.
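For the C/Fortran end of that spectrum, SymPy's codegen utility is one existing route (shown here only to illustrate the idea, not as this system's committed backend):

```python
from sympy import symbols, sin, exp
from sympy.utilities.codegen import codegen

x = symbols('x')
expr = sin(x) * exp(-x**2)

# Emit a C source/header pair defining double f(double x).
# Passing language='F95' would emit Fortran instead.
results = codegen(('f', expr), language='C', header=False)
for filename, contents in results:
    print('----', filename, '----')
    print(contents)
```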
Also, it seems useful to introduce this method in small pieces into an existing code (converting an entire code at once would usually be too dramatic a change to be successful). The transition would be smoother if the generated code fit the style of the existing code. A template system for generation would be a useful approach here. A text-oriented system (or a basic token-oriented one, like the C preprocessor) would work, but such systems have problems. A system oriented around an actual parse tree of the target language would allow the most powerful transformations. A sketch of the text-oriented end of this spectrum follows.
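A minimal illustration of text-oriented template generation (all names here are hypothetical; a parse-tree-oriented system would instead manipulate an AST of the target language, e.g. via a C parser such as pycparser):

```python
from string import Template

# Existing-code-style C skeleton with a hole for a generated expression.
c_template = Template("""\
double rhs(double x)
{
    /* body generated from the symbolic specification */
    return $expr;
}
""")

# The expression string would come from the CAS's code printer.
print(c_template.substitute(expr='sin(x)*exp(-x*x)'))
```

The limitation is visible even at this scale: the template knows nothing about C, so it cannot, say, rename variables consistently or verify that the spliced expression even parses - exactly the transformations a parse-tree representation would permit.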
- Is this related to reproducible research in any way?
Packages for reproducible research seem to be focused on capturing the parameters that are input to a scientific code, so that the results can be repeated and revisited later.
Executing a program may be reproducible, but the construction of that program is not.
Aside: Reproducibility can be achieved through documentation or automation. The previous standard for scientific work depended primarily on documentation. With computers, automation becomes possible as a working method. (For space reasons, this analysis is very simplistic - there is more to it.)
This work is intended to automate the construction of a code. One advantage is in correcting mistakes in the derivation - fix the mistake, push a button, and the final code is automatically updated.
This should be complementary to other systems for reproducible research.
- Why are you doing this?
I'm tired of tracking down missing minus signs and misplaced factors of 2 in my code. A significant feature of this approach is that the entire chain of derivation and generation of the program is performed by machine, eliminating transcription errors. (If this sounds too rosy, rest assured there are plenty of other kinds of errors left to make.)